Multi-Script Video Caption Localization Based on Visual Rhythms

Authors

Abstract

Localization of video captions plays an important role in information retrieval and multimedia applications. In this work, we present and evaluate a novel method for localizing captions using visual rhythms, which enable the representation and analysis of a specific feature throughout time. We build the rhythms from text location maps produced by general text localization methods, which are far more common in the literature than caption-oriented ones. Then, we process the rhythms properly to keep only the captions, generating caption masks. To meet the need for a standardized large dataset, we constructed a new one, where captions in thirteen different scripts were added to video frames, for a total of 221 videos with ground truth. Experiments demonstrate that our method achieves competitive results when compared with other approaches.
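
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the row-wise projection, the two thresholds, and the function names `build_visual_rhythm` and `keep_persistent_rows` are all assumptions chosen for illustration. The sketch assumes the per-frame text location maps are already available as binary arrays, and it relies on the observation that captions tend to stay at the same rows for many consecutive frames, while incidental scene text does not.

```python
import numpy as np


def build_visual_rhythm(text_maps: np.ndarray) -> np.ndarray:
    """Collapse a stack of text location maps into one 2D rhythm image.

    text_maps: array of shape (T, H, W), 1 where a frame pixel is text.
    Returns an (H, T) array: one column per frame, holding the fraction
    of text pixels in each row (an assumed row-wise projection).
    """
    return text_maps.mean(axis=2).T


def keep_persistent_rows(rhythm: np.ndarray,
                         min_coverage: float = 0.2,
                         min_frames: int = 5) -> np.ndarray:
    """Keep only rhythm entries that stay active for min_frames or more.

    Both thresholds are illustrative assumptions, not values from the paper.
    Returns a boolean (H, T) mask of rows/frames likely to contain captions.
    """
    active = rhythm >= min_coverage
    n_rows, n_frames = active.shape
    keep = np.zeros_like(active)
    for r in range(n_rows):
        start = None
        # Iterate one step past the end so a run touching the last frame closes.
        for t in range(n_frames + 1):
            on = t < n_frames and active[r, t]
            if on and start is None:
                start = t
            elif not on and start is not None:
                if t - start >= min_frames:
                    keep[r, start:t] = True
                start = None
    return keep
```

In this sketch, a caption that occupies the same rows for several consecutive frames shows up as a long horizontal run in the rhythm and survives the filter, while text detected in a single frame is discarded. Mapping the kept rows back onto each frame's text location map would then give a per-frame caption mask.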

Related resources

Automatic Caption Localization in Compressed Video

We present a method to automatically localize captions in JPEG compressed images and the I-frames of MPEG compressed videos. Caption text regions are segmented from background images using their distinguishing texture characteristics. Unlike previously published methods which fully decompress the video sequence before extracting the text regions, this method locates candidate caption text regi...

Integrating both Visual and Audio Cues for Enhanced Video Caption

Video caption refers to generating a descriptive sentence for a specific short video clip automatically, which has achieved remarkable success recently. However, most of the existing methods focus more on visual information while ignoring the synchronized audio cues. We propose three multimodal deep fusion strategies to maximize the benefits of visual-audio resonance information. The first one ...

Video Indexing and Automatic Caption Creation

This paper presents the design and implementation of a video indexing and automatic caption creation system. The system is able to extract audio from videos and to get the transcript directly from the audio file using the newly designed audio-to-text engine based on Hidden Markov Model (HMM). Transcripts can be edited and the corresponding time stamps are updated automatically. The video indexi...

Multi-Feature Based Visual Saliency Detection in Surveillance Video

The perception of video is different from that of image because of the motion information in video. Motion objects lead to the difference between two neighboring frames which is usually focused on. By far, most papers have contributed to image saliency but seldom to video saliency. Based on scene understanding, a new video saliency detection model with multi-features is proposed in this paper. ...

Predicting Visual Features from Text for Image and Video Caption Retrieval

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute Word2VisualVec, a deep neural network architecture that learns to predict...

Journal

Journal title: Applied Artificial Intelligence

Year: 2022

ISSN: 0883-9514, 1087-6545

DOI: https://doi.org/10.1080/08839514.2022.2032926